Transport Usage
To support running on remote machines, remotemanager uses a file-sending system internally referred to as Transport. For most use cases you will not need to interact with these structures directly; however, you may find their functions helpful for controlling files.
Transport types
Transport itself is not a useful structure, and you won’t get very far using it in its raw form. It exists to give a common set of methods to all subclasses. The primary subclass is transport.rsync:
[1]:
# dev note: be careful editing this tutorial,
# it's _very_ sensitive to files and folders already existing,
# they must be cleared prior to any run, else it will cause the CI to fail
from remotemanager.transport import rsync
Transport works through a queue system, and has a concept of push and pull. First, initialise your transport class with the arguments that you would like rsync to use:
[2]:
tr = rsync(flags='auv')
To actually transfer files, you must first queue them. Transport entities treat your current machine as the local or origin point, and the destination as the remote or target point. To begin, let’s create some folders and files for demonstration:
[3]:
from remotemanager import URL
url = URL()
url.cmd('rm -r temp_trn_local', raise_errors=False)
url.cmd('rm -r temp_trn_remote', raise_errors=False)
url.cmd('rm -r temp_trn_local_different', raise_errors=False)  # created later in this tutorial
url.utils.mkdir('temp_trn_local')
url.utils.mkdir('temp_trn_remote')
url.utils.touch('temp_trn_local/send_me')
url.utils.touch('temp_trn_local/send_me_also')
url.utils.touch('temp_trn_remote/fetch_me')
url.utils.touch('temp_trn_remote/fetch_me_too')
url.utils.touch('temp_trn_remote/fetch_me_differently')
[4]:
print(url.utils.ls('temp_trn_local'))
['send_me', 'send_me_also']
[5]:
print(url.utils.ls('temp_trn_remote'))
['fetch_me', 'fetch_me_differently', 'fetch_me_too']
Now we have 2 files on our “local” machine that we want to send, and 3 files on our “remote” machine that we need to fetch. Let’s start with pushing, for which we use the queue_for_push method. Its arguments are files, local, remote:
[6]:
tr.queue_for_push(['send_me', 'send_me_also'], 'temp_trn_local', 'temp_trn_remote')
With this done, we can see the transfers that are ready to occur, either by accessing the transfers property or by using the print_transfers method, which formats them for you:
[7]:
tr.transfers
[7]:
{'/home/test/remotemanager/docs/source/tutorials/temp_trn_local/>temp_trn_remote/': ['send_me',
'send_me_also']}
[8]:
tr.print_transfers()
transfer 1:
origin: /home/test/remotemanager/docs/source/tutorials/temp_trn_local/
target: temp_trn_remote/
(1/2) send_me
(2/2) send_me_also
Here we can see a single transfer ready to occur, which represents one rsync call. Before executing, we can inspect the commands that will be run by calling the transfer method with dry_run=True:
[9]:
tr.transfer(dry_run=True)
[9]:
[rsync -auv --checksum /home/test/remotemanager/docs/source/tutorials/temp_trn_local/{send_me,send_me_also} temp_trn_remote/]
This looks good, so let’s go:
[10]:
tr.transfer()
Transferring 2 Files... Done
Now check the “remote” folder to see what it looks like:
[11]:
url.utils.ls('temp_trn_remote')
[11]:
['fetch_me', 'fetch_me_differently', 'fetch_me_too', 'send_me', 'send_me_also']
The files have been sent as expected.
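To make the link between the transfers property and the dry-run output concrete, here is a minimal sketch that turns a transfers-style mapping into rsync command strings. It is not remotemanager’s implementation: transfers_to_commands is a hypothetical helper, and it assumes (from the output shown above) that each key is the origin and target joined by ">", and that --checksum is always appended.

```python
def transfers_to_commands(transfers, flags="auv"):
    """Build one rsync command per origin>target key (hypothetical helper)."""
    commands = []
    for key, files in transfers.items():
        # assumes neither folder path contains a ">" character
        origin, target = key.split(">", 1)
        if len(files) == 1:
            # a single file needs no brace expansion
            source = origin + files[0]
        else:
            # bash brace expansion: dir/{a,b} expands to dir/a dir/b
            source = origin + "{" + ",".join(files) + "}"
        commands.append(f"rsync -{flags} --checksum {source} {target}")
    return commands


transfers = {
    "/home/test/remotemanager/docs/source/tutorials/temp_trn_local/>temp_trn_remote/": [
        "send_me",
        "send_me_also",
    ]
}
for command in transfers_to_commands(transfers):
    print(command)
```

Because the braces are expanded by the shell, a command built this way has to be run through bash (for example with subprocess.run(command, shell=True, executable="/bin/bash")) rather than passed to rsync as a plain argument list.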
More complex movement
You may be aware that a single rsync call accepts many sources but only one destination, so it cannot handle a many-to-many situation. Handling this is the greatest strength of the Transport system: because transfers must be queued first, logic can be applied before any command is executed, and the minimum number of calls can be made.
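The grouping idea can be sketched in a few lines: queued files that share the same origin and target folders collapse into one call, and each distinct folder pair needs its own call. This is an illustration of the principle, not remotemanager’s code; plan_calls and the (file, origin, target) tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def plan_calls(queue):
    """Group (file, origin, target) entries into one call per folder pair."""
    calls = defaultdict(list)
    for name, origin, target in queue:
        calls[(origin, target)].append(name)
    return dict(calls)  # insertion order is preserved (Python 3.7+)


queue = [
    ("fetch_me", "temp_trn_remote/", "temp_trn_local/"),
    ("fetch_me_too", "temp_trn_remote/", "temp_trn_local/"),
    ("fetch_me_differently", "temp_trn_remote/", "temp_trn_local_different/"),
]

calls = plan_calls(queue)
print(f"Transferring {len(queue)} Files in {len(calls)} Transfers")
for (origin, target), files in calls.items():
    print(f"{origin} -> {target}: {files}")
```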
In the following example we have 3 files to fetch from the “remote”. Let’s assume that we want one of them to go to a different folder; Transport handles this for you:
[12]:
tr.queue_for_pull(['fetch_me', 'fetch_me_too'], 'temp_trn_local', 'temp_trn_remote')
url.utils.mkdir('temp_trn_local_different') # create a different target dir for this file
tr.queue_for_pull('fetch_me_differently', 'temp_trn_local_different', 'temp_trn_remote')
Note
Pay close attention to the folder ordering. Although we are pulling from the remote, Transport itself is still a connection from the “local” to the “remote”. Hence the folder order (files, local, remote) does not change.
Let’s look at our transfers:
[13]:
tr.print_transfers()
transfer 1:
origin: temp_trn_remote/
target: /home/test/remotemanager/docs/source/tutorials/temp_trn_local/
(1/2) fetch_me
(2/2) fetch_me_too
transfer 2:
origin: temp_trn_remote/
target: /home/test/remotemanager/docs/source/tutorials/temp_trn_local_different/
(1/1) fetch_me_differently
and the commands:
[14]:
tr.transfer(dry_run=True)
[14]:
[rsync -auv --checksum temp_trn_remote/{fetch_me,fetch_me_too} /home/test/remotemanager/docs/source/tutorials/temp_trn_local/,
rsync -auv --checksum temp_trn_remote/fetch_me_differently /home/test/remotemanager/docs/source/tutorials/temp_trn_local_different/]
Now execute the transfers, and look into the folders:
[15]:
tr.transfer()
Transferring 3 Files in 2 Transfers... Done
[16]:
print(url.utils.ls('temp_trn_local'))
['fetch_me', 'fetch_me_too', 'send_me', 'send_me_also']
[17]:
print(url.utils.ls('temp_trn_local_different'))
['fetch_me_differently']
Looks like all our files have been brought back to the correct place!
Bash Compatibility Mode
All Transports accept a dir_mode argument, either at init (rsync(..., dir_mode=True)) or on the transfer call (transfer(dir_mode=True)). In most cases you will not have access to the actual transfer call, so it is best to set it at init, or to update it via the dir_mode property if needed.
If True, every transfer gains an extra step: the files to be transferred are first copied to a temporary directory, then transferred using a * glob. This avoids generating the command with bash brace expansion, whose behaviour can differ between machines.
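The staging idea behind dir_mode can be sketched with the standard library alone. Brace expansion ({a,b}) is a bash extension and is not part of POSIX sh (dash, for example, does not perform it), whereas * pathname expansion is standard. The helpers stage_files and glob_command below are hypothetical and only build the command; they are not remotemanager’s implementation.

```python
import shutil
import tempfile
from pathlib import Path


def stage_files(origin: Path, files: list, staging_root: Path) -> Path:
    """Copy the queued files from origin into a fresh staging directory."""
    staging = Path(tempfile.mkdtemp(dir=str(staging_root)))
    for name in files:
        shutil.copy2(origin / name, staging / name)
    return staging


def glob_command(flags: str, staging: Path, target: str) -> str:
    """Build an rsync command that relies on * instead of {a,b} expansion.

    Note that * does not match names starting with a dot by default.
    """
    return f"rsync -{flags} --checksum {staging}/* {target}"


with tempfile.TemporaryDirectory() as workdir:
    origin = Path(workdir) / "temp_trn_local"
    origin.mkdir()
    for name in ("send_me", "send_me_also"):
        (origin / name).touch()

    staging = stage_files(origin, ["send_me", "send_me_also"], Path(workdir))
    staged_names = sorted(p.name for p in staging.iterdir())
    cmd = glob_command("auv", staging, "temp_trn_remote/")
    print(staged_names)
    print(cmd)
```

Since the staging directory holds only the queued files, the * glob matches exactly those files, without the command having to list them.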
Progress
If you have used rsync before, you may be aware of its --progress option, which prints a continuous stream of updates as the files are transferred. When creating an rsync object, you can enable this for your terminal by setting progress=True in the initial call.
We can demonstrate this using a Dataset:
[18]:
from remotemanager import Dataset
from remotemanager.transport import rsync
def f(i):
    return i
ds = Dataset(f, transport=rsync(progress=True), skip=False)
ds.append_run({"i": 1})
ds.append_run({"i": 2})
appended run runner-0
appended run runner-1
[19]:
ds.run()
Staging Dataset... Staged 2/2 Runners
Transferring for 2/2 Runners
Transferring 7 Files
sending incremental file list
dataset-a6e26708-master.sh
377 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=6/7)
dataset-a6e26708-repo.py
5.60K 100% 5.34MB/s 0:00:00 (xfr#2, to-chk=5/7)
dataset-a6e26708-repo.sh
444 100% 433.59kB/s 0:00:00 (xfr#3, to-chk=4/7)
dataset-a6e26708-runner-0-jobscript.sh
137 100% 133.79kB/s 0:00:00 (xfr#4, to-chk=3/7)
dataset-a6e26708-runner-0-run.py
987 100% 963.87kB/s 0:00:00 (xfr#5, to-chk=2/7)
dataset-a6e26708-runner-1-jobscript.sh
137 100% 133.79kB/s 0:00:00 (xfr#6, to-chk=1/7)
dataset-a6e26708-runner-1-run.py
987 100% 963.87kB/s 0:00:00 (xfr#7, to-chk=0/7)
sent 9.29K bytes received 149 bytes 18.89K bytes/sec
total size is 8.67K speedup is 0.92
Done
Remotely executing 2/2 Runners
[19]:
True
You can, of course, override this behaviour with verbose=False (force=True is used here so that the Dataset runs again):
[20]:
ds.run(verbose=False, force=True)
[20]:
True
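For comparison, the same --progress flag can be passed to rsync directly from Python. This is a minimal sketch independent of remotemanager: rsync_argv is a hypothetical helper, and the rsync call only runs if an rsync executable is found on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def rsync_argv(flags, sources, target, progress=False):
    """Build an rsync argument list; --progress is added only when requested."""
    argv = ["rsync", f"-{flags}"]
    if progress:
        argv.append("--progress")
    return argv + [str(s) for s in sources] + [str(target)]


copied = None  # stays None when rsync is not installed
with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "send_me"
    src.write_text("hello\n")
    dest = Path(workdir) / "dest"
    dest.mkdir()

    argv = rsync_argv("auv", [src], str(dest) + "/", progress=True)
    print(" ".join(argv))

    # A list is passed without a shell, so no brace or glob expansion
    # happens: every source file must be listed explicitly.
    if shutil.which("rsync"):
        subprocess.run(argv, check=True)
        copied = (dest / "send_me").exists()
```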
Advanced Usage
Contrary to the note on folder ordering above, there is one further method that inverts the folder ordering. In fact, both queueing methods call this method internally, acting as formatters for its arguments.
This method is not intended to be called by the user, but is left as a non-private function for those who prefer its behaviour.
Instead of passing files, local, remote, you must pass files, origin, target, mode. This takes a file-centric view, so for a pull the origin is the remote directory. The mode simply tells Transport where to put the structures for connecting to the remote, and can be either “push” or “pull”:
[21]:
tr.add_transfer('fetch_me', 'temp_trn_remote', 'temp_trn_local', 'pull')
[22]:
tr.transfer(dry_run=True)
[22]:
[rsync -auv --checksum temp_trn_remote/fetch_me /home/test/remotemanager/docs/source/tutorials/temp_trn_local/]
As you can see, the transfer is created as intended, despite the “swapped” folders. You may find this convention more sensible and prefer to use it. As the queue functions exist solely to call this function, it should remain a safe method for those who wish to use it.
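The relationship between the queue functions and add_transfer can be sketched as a small class. SketchTransport is hypothetical and heavily simplified (it does not make folders absolute or add trailing slashes, as the real output above does). In particular, it assumes that the “structures for connecting to the remote” can be modelled as an rsync-style host: prefix placed on whichever folder lives on the remote.

```python
from __future__ import annotations


class SketchTransport:
    """Hypothetical, simplified queue; not remotemanager's Transport class."""

    def __init__(self, host: str | None = None) -> None:
        # Assumption: the remote connection is modelled as a "host:" prefix.
        self.host = host
        self.transfers: dict = {}

    def _on_remote(self, folder: str) -> str:
        return f"{self.host}:{folder}" if self.host else folder

    def add_transfer(self, files, origin: str, target: str, mode: str) -> None:
        """File-centric view: files always move from origin to target."""
        if mode not in ("push", "pull"):
            raise ValueError(f"mode must be 'push' or 'pull', not {mode!r}")
        if isinstance(files, str):
            files = [files]
        if mode == "push":
            target = self._on_remote(target)  # pushing: the target is remote
        else:
            origin = self._on_remote(origin)  # pulling: the origin is remote
        self.transfers.setdefault(f"{origin}>{target}", []).extend(files)

    def queue_for_push(self, files, local: str, remote: str) -> None:
        self.add_transfer(files, local, remote, "push")

    def queue_for_pull(self, files, local: str, remote: str) -> None:
        # Same (local, remote) order as a push; this formatter swaps them.
        self.add_transfer(files, remote, local, "pull")


sketch = SketchTransport(host="user@example.com")
sketch.queue_for_pull("fetch_me", "temp_trn_local/", "temp_trn_remote/")
sketch.add_transfer("fetch_me_too", "temp_trn_remote/", "temp_trn_local/", "pull")
print(sketch.transfers)
```

Both calls land in the same queued transfer, which is the sense in which the queue functions are only formatters for add_transfer.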
Naming Conventions
For reference, the table below sums up the naming conventions used within the source, for those who want to do further reading:
| name | meaning |
|---|---|
| local | “local” folder, regardless of mode of use |
| remote | “remote” folder, regardless of mode of use |
| origin | starting folder for the files; the first folder in an rsync command |
| target | destination folder for the files; the second folder in an rsync command |
Note
Be aware of the argument expansion limitation that exists in rsync versions below 3. If you get errors during transfer, check that rsync --version reports version 3 or later.
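If you want to automate that check, a minimal sketch is to parse the first line of rsync --version with a regular expression. rsync_major_version is a hypothetical helper; the pattern accepts both “version 3.2.7” and “version v3.2.7” styles, and the live check only runs when rsync is on the PATH.

```python
import re
import shutil
import subprocess


def rsync_major_version(version_text: str) -> int:
    """Extract the major version number from `rsync --version` output."""
    # requires a dot after the digits, so "protocol version 31" is not matched
    match = re.search(r"version\s+v?(\d+)\.", version_text)
    if match is None:
        raise ValueError("could not find an rsync version number")
    return int(match.group(1))


sample = "rsync  version 3.2.7  protocol version 31"
print(rsync_major_version(sample))

if shutil.which("rsync"):
    out = subprocess.run(
        ["rsync", "--version"], capture_output=True, text=True, check=True
    ).stdout
    major = rsync_major_version(out)
    if major < 3:
        print(f"rsync {major}.x found: transfers may hit argument expansion limits")
```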